1 Getting the Data

To get information about the job titles and salaries, I downloaded the data from the Ohio Checkbook on March 1, 2024 at 3:50 PM. The data is stored in the file data/EmployeePayroll.csv. In the code chunk below, I:

  • read the data and cleaned it up;
  • counted the number of employees in each job title; and then
  • saved the result in data/salaries_tbl.csv.

Note that there is a total of 221 unique job titles at the Ohio Department of Transportation, based on the data downloaded from the Ohio Checkbook on March 1, 2024 at 3:50 PM.

salaries = readr::read_csv("../data/EmployeePayroll.csv") |> 
  janitor::clean_names()
## Rows: 5546 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Name, Job Title, Agency, Max. Hourly Rate, Amount
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
salaries = salaries |>
  dplyr::mutate(
    max_hourly_rate = stringr::str_replace_all(max_hourly_rate, "\\$|,", "") |> as.numeric(),
    amount = stringr::str_replace_all(amount, "\\$|,", "") |> as.numeric()
  )

position_tables = table(salaries$job_title)

salaries_tbl = tibble::tibble(
  index = 1:length(position_tables),
  job_title = names(position_tables),
  count = position_tables |> as.integer()
)

readr::write_csv(salaries_tbl, "../data/salaries_tbl.csv")

DT::datatable(
  salaries_tbl |> dplyr::select(job_title, count), 
  rownames = FALSE, 
  extensions = 'Buttons',
  options = list(
    pageLength = 15, scrollX = TRUE, autoWidth = TRUE,
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print') 
    )
)

2 Projected Tasks for Each Job Title and how they would be Impacted by AI

2.1 Using OpenAI API to Generate Task Descriptions and Exposure Scores

Due to the lack of Ohio Department of Transportation (ODOT) data in the proposal stage, we will utilize the OpenAI API to perform the following four tasks for each job title in the dataset:

  1. Generate a 2-3 sentence description of a singular main task performed by the person with this title given that they work for the Ohio Department of Transportation. While they may be performing many tasks, provide the task that would be most representative of that job title.
  2. Reason step by step to decide which of the labels (E0/E1/E2/E3) from the exposure rubric you were given applies to the task’s exposure to LLM. Report. Give an explanation for which label applies and report the label that you think fits best. Do not say zero or N/A. Note that the labels correspond to:
    • E0: Used for tasks where LLMs and related tools don’t reduce completion time by half, including physically interactive tasks, precise measurement, detailed visual review, and tasks requiring legal human execution.
    • E1: Used for tasks where direct access to an LLM like ChatGPT halves completion time, including text/code writing, editing, summarizing documents, answering questions, email responses, and creating training materials.
    • E2: Used for tasks where LLMs don’t reduce completion time by half, but additional software could be developed to do so, including summarizing long documents, retrieving facts, searching knowledge, making recommendations, and maintaining complex databases.
    • E3: Used for tasks where LLMs and image-related tools reduce completion time by half, including reading text from PDFs, scanning images, and creating/editing digital images.
  3. Report only the label that you determined for the task, which should match the label in step 1. Do not reply N/A.
  4. Given the amount of speculation required in step 1, describe your certainty about the estimate–either high, moderate, or low. Ensure that your response is consistent with the provided descriptions, seperating each of the four things with two new lines. Each of the things must start with its number followed by a colon and a space. For example, ‘1: This is the first thing.’ Do not return * as part of your response. Stick to the guidelines as the response will be parsed into four seperate columns.

Note that our approach builds on the work of Eisfeldt et al. (2023). Generative AI and Firm Values. There are four main differences between our work for this proposal and the original paper:
- Conceptual/Design:
+ We did not have task descriptions for each job title, so we also generated them using the OpenAI API. + We focused on the main task performed by the person with a specific job title. If we are funded*, we will assess multiple task descriptions for a given role, and we will ask ODOT to provide these task descriptions.
- Technical:
+ We used the latest OpenAI model, GPT-4 Turbo, to generate the exposure scores.
+ We also used the ChatCompletion API instead of the legacy completions API.

# Loading the required libraries:
# ------------------------------
import time

import pandas as pd
import datetime as dt
import numpy as np
import openai # pip install openai==0.28

from dotenv import load_dotenv


# Loading the environment variable:
# ---------------------------------
load_dotenv() # loads the OpenAI key from the .env file
## True
openai.api_key = os.getenv('OPENAI_API_KEY')


# Model to be used:
# -----------------
model_name = 'gpt-4-turbo-preview' # latest_model_name


# Task Data and Rubric:
# ---------------------
task = pd.read_csv('../data/salaries_tbl.csv') # read the entire dataset
task.rename(columns = {'job_title': 'occupation'}, inplace = True) # to reduce changes from paper below

# AI Firm Exposure Analysis based on
# https://www.nber.org/system/files/working_papers/w31222/w31222.pdf :
# ----------------------------------------------------------------------

rubric = {
    'E1': 'Label tasks E1 if direct access to the LLM through an interface like ChatGPT or the OpenAI playground alone can reduce the time it takes to complete the task with equivalent quality by at least half. This includes tasks that can be reduced to: - Writing and transforming text and code according to complex instructions, - Providing edits to existing text or code following specifications, - Writing code that can help perform a task that used to be done by hand, - Translating text between languages, - Summarizing medium-length documents, - Providing feedback on documents, - Answering questions about a document, - Generating questions a user might want to ask about a document, - Writing questions for an interview or assessment, - Writing and responding to emails, including ones that involve refuting information or engaging in a negotiation (but only if the negotiation is via written correspondence), - Maintain records of written data, - Prepare training materials based on general knowledge, or - Inform anyone of any information via any written or spoken medium.',
    'E2': 'Label tasks E2 if having access to the LLM alone may not reduce the time it takes to complete the task by at least half, but it is easy to imagine additional software that could be developed on top of the LLM that would reduce the time it takes to complete the task by half. This software may include capabilities such as: - Summarizing documents longer than 100,000 words and answering questions about those documents, - Retrieving up-to-date facts from the Internet and using those facts in combination with the LLM capabilities, - Searching over an organization’s existing knowledge, data, or documents and retrieving information, - Retrieving highly specialized domain knowledge, - Make recommendations given data or written input, - Analyze written information to inform decisions, - Prepare training materials based on highly specialized knowledge, - Provide counsel on issues, and - Maintain complex databases.',
    'E3': 'Suppose you had access to both the LLM and a system that could view, caption, and create images as well as any systems powered by the LLM (those in E2 above). This system cannot take video as an input and it cannot produce video as an output. This system cannot accurately retrieve very detailed information from image inputs, such as measurements of dimensions within an image. Label tasks as E3 if there is a significant reduction in the time it takes to complete the task given access to a LLM and these image capabilities: - Reading text from PDFs, - Scanning images, or - Creating or editing digital images according to instructions. The images can be realistic but they should not be detailed. The model can identify objects in the image but not relationships between those options',
    'E0': 'Label tasks E0 if none of the above clearly decrease the time it takes for an experienced worker to complete the task with high quality by at least half. Some examples: - If a task requires a high degree of human interaction (for example, in-person demonstrations) then it should be classified as E0. - If a task requires precise measurements then it should be classified as E0. - If a task requires reviewing visuals in detail then it should be classified as E0. - If a task requires any use of a hand or walking then it should be classified as E0. - Tools built on top of the LLM cannot make any decisions that might impact human livelihood (e.g.hiring, grading, etc.). If any part of the task involves collecting inputs to make a final decision (as opposed to analyzing data to inform a decision or make a recommendation) then it should be classified as E0. The LLM can make recommendations. - Even if tools built on top of the LLM can do a task, if using those tools would not save an experienced worker significant time completing the task, then it should be classified as E0. - The LLM and systems built on top of it cannot do anything that legally requires a human to perform the task. - If there is existing technology not powered by an LLM that is commonly used and can complete the task then you should mark the task E0 if using an LLM or LLM-powered tool will not further reduce the time to complete the task. When in doubt, you should default to E0.'
}

# Slightly Modified System Prompt:
# -----------------------------------
system_prompt = f"""
Consider the most powerful OpenAI large language model (LLM). This model can complete many tasks that can be formulated as having text input and text output where the context for the input can be captured in 100,000 words. The model also cannot draw up-to-date facts (those from <1 year ago) unless they are captured in the input. Assume you are a worker with an average level of expertise in your role trying to complete the given task. You have access to the LLM as well as any other existing software or computer hardware tools mentioned in the task. You also have access to any commonly available technical tools accessible via a laptop (e.g. a microphone, speakers, etc.). You do not have access to any other physical tools or materials. You are a helpful research assistant who wants to label the given tasks according to the rubric below. Equivalent quality means someone reviewing the work would not be able to tell whether a human completed it on their own or with assistance from the LLM. If you aren’t sure how to judge the amount of time a task takes, consider whether the tools described exposed the majority of subtasks associated with the task.

Rubric:
{rubric}

Your role:
You will be provided with an occupation (job title). Then, you must do four things. 1: Generate a 2-3 sentence description of a singular main task performed by the person with this title given that they work for the Ohio Department of Transportation. While they may be performing many tasks, provide the task that would be most representative of that job title. 2: Reason step by step to decide which of the labels (E0/E1/E2/E3) from the exposure rubric you were given applies to the task’s exposure to LLM. Report. Give an explanation for which label applies and report the label that you think fits best. Do not say zero or N/A. 3: Report only the label that you determined for the task, which should match the label in step 1. Do not reply N/A. 4: Given the amount of speculation required in step 1, describe your certainty about the estimate–either high, moderate, or low. Ensure that your response is consistent with the provided descriptions, seperating each of the four things with two new lines. Each of the things must start with its number followed by a colon and a space. For example, '1: This is the first thing.' Do not return * as part of your response. Stick to the guidelines as the response will be parsed into four seperate columns.
"""


# Function to Score the Exposure:
# -------------------------------
def score_exposure_and_explanation(occupation, rubric, system_prompt):
  
  prompt = f"\nOccupation: {occupation}\nRubric:{rubric}\nSystem Prompt{system_prompt}"
  
  # updated the response to use the ChatCompletion API instead of the legacy
  # completions API (that is why the previous code did not work)
  response = openai.ChatCompletion.create(
    model=model_name, 
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ],
    temperature=0,
    max_tokens=500
    )
  
  # note also the explanation is now a bit different
  explanation = response.choices[0].message['content'].strip()
  
  return explanation

# Iterate through each occupation in our dataset and score the exposure:
# ----------------------------------------------------------------------
for index, row in task.iterrows():
    time.sleep(2) # to avoid rate limiting
    
    occupation = row['occupation']

   # Score the exposure level using the defined function
    try:
        gpt_response = score_exposure_and_explanation(occupation, rubric, system_prompt)
    except Exception as e:
        print(f"Error occurred while generating explanation for occupation '{occupation}': {str(e)}")
        gpt_response = "Error occurred while generating explanation"
    
    # Store the model, timestamp, explanation in the dataset
    task.at[index, 'gpt_model'] = model_name
    task.at[index, 'gpt_response_timestamp'] = dt.datetime.now()
    task.at[index, 'gpt_response'] = gpt_response
## Error occurred while generating explanation for occupation 'Sign Fabrication Superintenden': HTTP code 502 from API (<html>
## <head><title>502 Bad Gateway</title></head>
## <body>
## <center><h1>502 Bad Gateway</h1></center>
## <hr><center>cloudflare</center>
## </body>
## </html>
## )


# Save the dataset with the explanations:
# ----------------------------------------
# back to the original names
task.rename(columns = {'occupation' : 'job_title'}, inplace = True) 
task.to_excel('../data/task_statements_with_explanations.xlsx', index=False)

3 Results

3.1 Extracting the Results

The results of the exposure scores are stored in the file data/task_statements_with_explanations.xlsx. In the code chunk below, we divide the gpt_responses into four columns: task_description, ai_exposure, explanation and certainty.

# Cleaning the results:
# ---------------------
results = readxl::read_excel("../data/task_statements_with_explanations.xlsx") |> 
  # capitalize on the fact that each piece is separated by two new lines and 
  # starts with number followed by a colon and a space
  dplyr::mutate(
    task_description = stringr::str_extract(gpt_response, "1:.*\n\n2:") |> stringr::str_remove("1: ") |> stringr::str_remove("\n\n2:"),
    ai_exposure = stringr::str_extract(gpt_response, "3:.*\n\n4:") |> stringr::str_remove("3: ") |> stringr::str_remove("\n\n4:"),
    explanation = stringr::str_extract(gpt_response, "2:.*\n\n3:") |> stringr::str_remove("2: ") |> stringr::str_remove("\n\n3:"),
    certainty = stringr::str_extract(gpt_response, "4:.*$") |> stringr::str_remove("4: ") |> 
      # remove text that starts with '. The reasoning' and ends with '.'
      stringr::str_remove("\\. The reasoning.*\\.") |> 
      # similarly, remove text that starts with '. The assessment' and ends with '.'
      stringr::str_remove("\\. The assessment.*\\.")
  )

# Save the results:
# -----------------
readr::write_csv(results, "../data/task_statements_with_formatted_explanations.csv")

# Display the results:
# --------------------
DT::datatable(
  results |> dplyr::select(job_title, count, task_description, ai_exposure, explanation, certainty), 
  rownames = FALSE, 
  extensions = 'Buttons',
  options = list(
    pageLength = 15, scrollX = TRUE, autoWidth = TRUE,
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print') 
    )
)

3.2 Summary of the Results

3.2.1 Statistical Summaries

To summarize the results, we will first provide some summary statistics that capture the distribution of the exposure scores across the job titles. Then, we will provide a brief description of the exposure scores and the certainty of the estimates.

# Summary Statistics:
# --------------------
summary_results = results |> 
  dplyr::group_by(ai_exposure) |> 
  dplyr::summarize(
    num_job_titles = dplyr::n(),
    num_jobs = sum(count),
    mode_certainty = VIM::maxCat(certainty)
  )

# Display the summary statistics:
# -------------------------------
DT::datatable(
  summary_results, 
  rownames = FALSE, 
  extensions = 'Buttons',
  options = list(
    pageLength = 15, scrollX = TRUE, autoWidth = TRUE,
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print') 
    )
)

3.2.2 Tree Map of the Exposure Scores

To visualize the distribution of the exposure scores across the job titles, we will use a tree map. The tree map will provide a visual representation of the exposure scores and the number of job titles in each category.

treemap::treemap(
  results,
  index = c("ai_exposure", "job_title"),
  vSize = "count",
  vColor = "ai_exposure",
  palette = c("#B61E2E", "#828282", "#FFFFFF"),
  fontsize.labels=c(24,12),
  fontface.labels=c(2,2),
  border.col=c("black","#2F4F4F"), 
  border.lwds=c(7,2),
  align.labels=list(
    c("left", "top"), 
    c("center", "center")
  ),  
  overlap.labels=1, 
  inflate.labels=F,  
  title="AI Exposure Scores by Job Title: Area Proportional to the Number of Jobs",
  position.legend = 'none',
  force.print.labels = FALSE,
  draw=TRUE
)

---
title: "Preliminary Analysis for AI/ML ODOT"
author: "Fadel M. Megahed"
date: "2024-03-02"
output: 
  html_document:
    code_folding: show
    code_download: true
    toc: true
    toc_depth: 2
    toc_float: true
    number_sections: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
```

# Getting the Data

To get information about the job titles and salaries, I downloaded the data from the [Ohio Checkbook](https://checkbook.ohio.gov/Salaries/State.aspx?linkFrom=State%20Popular%20Searches) on March 1, 2024 at 3:50 PM. The data is stored in the file `data/EmployeePayroll.csv`. In the code chunk below, I:   

  - read the data and cleaned it up;  
  - counted the number of employees in each job title; and then  
  - saved the result in `data/salaries_tbl.csv`. 

Note that there is a total of **221 unique job titles** at the Ohio Department of Transportation, based on the data downloaded from the [Ohio Checkbook](https://checkbook.ohio.gov/Salaries/State.aspx?linkFrom=State%20Popular%20Searches) on March 1, 2024 at 3:50 PM. 

```{r get_data}
salaries = readr::read_csv("../data/EmployeePayroll.csv") |> 
  janitor::clean_names()
  
salaries = salaries |>
  dplyr::mutate(
    max_hourly_rate = stringr::str_replace_all(max_hourly_rate, "\\$|,", "") |> as.numeric(),
    amount = stringr::str_replace_all(amount, "\\$|,", "") |> as.numeric()
  )

position_tables = table(salaries$job_title)

salaries_tbl = tibble::tibble(
  index = 1:length(position_tables),
  job_title = names(position_tables),
  count = position_tables |> as.integer()
)

readr::write_csv(salaries_tbl, "../data/salaries_tbl.csv")

DT::datatable(
  salaries_tbl |> dplyr::select(job_title, count), 
  rownames = FALSE, 
  extensions = 'Buttons',
  options = list(
    pageLength = 15, scrollX = TRUE, autoWidth = TRUE,
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print') 
    )
)
```


# Projected Tasks for Each Job Title and how they would be Impacted by AI

## Using OpenAI API to Generate Task Descriptions and Exposure Scores

Due to the lack of Ohio Department of Transportation (ODOT) data in the proposal stage, we will utilize the OpenAI API to perform the following four tasks for each job title in the dataset:   

  1. Generate a 2-3 sentence description of a singular main task performed by the person with this title given that they work for the Ohio Department of Transportation. While they may be performing many tasks, provide the task that would be most representative of that job title.  
  2. Reason step by step to decide which of the labels (E0/E1/E2/E3) from the exposure rubric you were given applies to the task’s exposure to LLM. Report. Give an explanation for which label applies and report the label that you think fits best. Do not say zero or N/A. Note that the labels correspond to:    
      + **E0:** Used for tasks where LLMs and related tools don't reduce completion time by half, including physically interactive tasks, precise measurement, detailed visual review, and tasks requiring legal human execution.    
      + **E1:** Used for tasks where direct access to an LLM like ChatGPT halves completion time, including text/code writing, editing, summarizing documents, answering questions, email responses, and creating training materials.  
      + **E2:** Used for tasks where LLMs don't reduce completion time by half, but additional software could be developed to do so, including summarizing long documents, retrieving facts, searching knowledge, making recommendations, and maintaining complex databases.  
      + **E3:** Used for tasks where LLMs and image-related tools reduce completion time by half, including reading text from PDFs, scanning images, and creating/editing digital images.  
  3. Report only the label that you determined for the task, which should match the label in step 1. Do not reply N/A.  
  4. Given the amount of speculation required in step 1, describe your certainty about the estimate–either high, moderate, or low. Ensure that your response is consistent with the provided descriptions, seperating each of the four things with two new lines. Each of the things must start with its number followed by a colon and a space. For example, '1: This is the first thing.' Do not return * as part of your response. Stick to the guidelines as the response will be parsed into four seperate columns.  
  
Note that our approach builds on the work of [Eisfeldt et al. (2023). Generative AI and Firm Values](https://www.nber.org/system/files/working_papers/w31222/w31222.pdf). There are four main differences between our work for this proposal and the original paper:  
  - **Conceptual/Design:**  
    + We did not have task descriptions for each job title, so we also generated them using the OpenAI API. 
    + We focused on the main task performed by the person with a specific job title. If we are **funded***, we will assess multiple task descriptions for a given role, and we will ask ODOT to provide these task descriptions.  
  - **Technical:**  
    + We used the latest OpenAI model, GPT-4 Turbo, to generate the exposure scores.  
    + We also used the ChatCompletion API instead of the legacy completions API.  
    

```{python get_exposure, cache=TRUE}
# Loading the required libraries:
# ------------------------------
import time

import pandas as pd
import datetime as dt
import numpy as np
import openai # pip install openai==0.28

from dotenv import load_dotenv


# Loading the environment variable:
# ---------------------------------
load_dotenv() # loads the OpenAI key from the .env file
openai.api_key = os.getenv('OPENAI_API_KEY')


# Model to be used:
# -----------------
model_name = 'gpt-4-turbo-preview' # latest_model_name


# Task Data and Rubric:
# ---------------------
task = pd.read_csv('../data/salaries_tbl.csv') # read the entire dataset
task.rename(columns = {'job_title': 'occupation'}, inplace = True) # to reduce changes from paper below

# AI Firm Exposure Analysis based on
# https://www.nber.org/system/files/working_papers/w31222/w31222.pdf :
# ----------------------------------------------------------------------

rubric = {
    'E1': 'Label tasks E1 if direct access to the LLM through an interface like ChatGPT or the OpenAI playground alone can reduce the time it takes to complete the task with equivalent quality by at least half. This includes tasks that can be reduced to: - Writing and transforming text and code according to complex instructions, - Providing edits to existing text or code following specifications, - Writing code that can help perform a task that used to be done by hand, - Translating text between languages, - Summarizing medium-length documents, - Providing feedback on documents, - Answering questions about a document, - Generating questions a user might want to ask about a document, - Writing questions for an interview or assessment, - Writing and responding to emails, including ones that involve refuting information or engaging in a negotiation (but only if the negotiation is via written correspondence), - Maintain records of written data, - Prepare training materials based on general knowledge, or - Inform anyone of any information via any written or spoken medium.',
    'E2': 'Label tasks E2 if having access to the LLM alone may not reduce the time it takes to complete the task by at least half, but it is easy to imagine additional software that could be developed on top of the LLM that would reduce the time it takes to complete the task by half. This software may include capabilities such as: - Summarizing documents longer than 100,000 words and answering questions about those documents, - Retrieving up-to-date facts from the Internet and using those facts in combination with the LLM capabilities, - Searching over an organization’s existing knowledge, data, or documents and retrieving information, - Retrieving highly specialized domain knowledge, - Make recommendations given data or written input, - Analyze written information to inform decisions, - Prepare training materials based on highly specialized knowledge, - Provide counsel on issues, and - Maintain complex databases.',
    'E3': 'Suppose you had access to both the LLM and a system that could view, caption, and create images as well as any systems powered by the LLM (those in E2 above). This system cannot take video as an input and it cannot produce video as an output. This system cannot accurately retrieve very detailed information from image inputs, such as measurements of dimensions within an image. Label tasks as E3 if there is a significant reduction in the time it takes to complete the task given access to a LLM and these image capabilities: - Reading text from PDFs, - Scanning images, or - Creating or editing digital images according to instructions. The images can be realistic but they should not be detailed. The model can identify objects in the image but not relationships between those options',
    'E0': 'Label tasks E0 if none of the above clearly decrease the time it takes for an experienced worker to complete the task with high quality by at least half. Some examples: - If a task requires a high degree of human interaction (for example, in-person demonstrations) then it should be classified as E0. - If a task requires precise measurements then it should be classified as E0. - If a task requires reviewing visuals in detail then it should be classified as E0. - If a task requires any use of a hand or walking then it should be classified as E0. - Tools built on top of the LLM cannot make any decisions that might impact human livelihood (e.g.hiring, grading, etc.). If any part of the task involves collecting inputs to make a final decision (as opposed to analyzing data to inform a decision or make a recommendation) then it should be classified as E0. The LLM can make recommendations. - Even if tools built on top of the LLM can do a task, if using those tools would not save an experienced worker significant time completing the task, then it should be classified as E0. - The LLM and systems built on top of it cannot do anything that legally requires a human to perform the task. - If there is existing technology not powered by an LLM that is commonly used and can complete the task then you should mark the task E0 if using an LLM or LLM-powered tool will not further reduce the time to complete the task. When in doubt, you should default to E0.'
}

# Slightly Modified System Prompt:
# -----------------------------------
system_prompt = f"""
Consider the most powerful OpenAI large language model (LLM). This model can complete many tasks that can be formulated as having text input and text output where the context for the input can be captured in 100,000 words. The model also cannot draw up-to-date facts (those from <1 year ago) unless they are captured in the input. Assume you are a worker with an average level of expertise in your role trying to complete the given task. You have access to the LLM as well as any other existing software or computer hardware tools mentioned in the task. You also have access to any commonly available technical tools accessible via a laptop (e.g. a microphone, speakers, etc.). You do not have access to any other physical tools or materials. You are a helpful research assistant who wants to label the given tasks according to the rubric below. Equivalent quality means someone reviewing the work would not be able to tell whether a human completed it on their own or with assistance from the LLM. If you aren’t sure how to judge the amount of time a task takes, consider whether the tools described exposed the majority of subtasks associated with the task.

Rubric:
{rubric}

Your role:
You will be provided with an occupation (job title). Then, you must do four things. 1: Generate a 2-3 sentence description of a singular main task performed by the person with this title given that they work for the Ohio Department of Transportation. While they may be performing many tasks, provide the task that would be most representative of that job title. 2: Reason step by step to decide which of the labels (E0/E1/E2/E3) from the exposure rubric you were given applies to the task’s exposure to LLM. Report. Give an explanation for which label applies and report the label that you think fits best. Do not say zero or N/A. 3: Report only the label that you determined for the task, which should match the label in step 1. Do not reply N/A. 4: Given the amount of speculation required in step 1, describe your certainty about the estimate–either high, moderate, or low. Ensure that your response is consistent with the provided descriptions, seperating each of the four things with two new lines. Each of the things must start with its number followed by a colon and a space. For example, '1: This is the first thing.' Do not return * as part of your response. Stick to the guidelines as the response will be parsed into four seperate columns.
"""


# Function to Score the Exposure:
# -------------------------------
def score_exposure_and_explanation(occupation, rubric, system_prompt):
  
  prompt = f"\nOccupation: {occupation}\nRubric:{rubric}\nSystem Prompt{system_prompt}"
  
  # updated the response to use the ChatCompletion API instead of the legacy
  # completions API (that is why the previous code did not work)
  response = openai.ChatCompletion.create(
    model=model_name, 
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ],
    temperature=0,
    max_tokens=500
    )
  
  # note also the explanation is now a bit different
  explanation = response.choices[0].message['content'].strip()
  
  return explanation

# Iterate through each occupation in our dataset and score the exposure:
# ----------------------------------------------------------------------
for index, row in task.iterrows():
    time.sleep(2) # to avoid rate limiting
    
    occupation = row['occupation']

   # Score the exposure level using the defined function
    try:
        gpt_response = score_exposure_and_explanation(occupation, rubric, system_prompt)
    except Exception as e:
        print(f"Error occurred while generating explanation for occupation '{occupation}': {str(e)}")
        gpt_response = "Error occurred while generating explanation"
    
    # Store the model, timestamp, explanation in the dataset
    task.at[index, 'gpt_model'] = model_name
    task.at[index, 'gpt_response_timestamp'] = dt.datetime.now()
    task.at[index, 'gpt_response'] = gpt_response


# Save the dataset with the explanations:
# ----------------------------------------
# back to the original names
task.rename(columns = {'occupation' : 'job_title'}, inplace = True) 
task.to_excel('../data/task_statements_with_explanations.xlsx', index=False)
```


# Results

## Extracting the Results

The results of the exposure scores are stored in the file `data/task_statements_with_explanations.xlsx`. In the code chunk below, we divide the `gpt_responses` into four columns: `task_description`, `ai_exposure`, `explanation` and `certainty`. 


```{r get_results}
# Cleaning the results:
# ---------------------
results = readxl::read_excel("../data/task_statements_with_explanations.xlsx") |> 
  # capitalize on the fact that each piece is separated by two new lines and 
  # starts with number followed by a colon and a space
  dplyr::mutate(
    task_description = stringr::str_extract(gpt_response, "1:.*\n\n2:") |> stringr::str_remove("1: ") |> stringr::str_remove("\n\n2:"),
    ai_exposure = stringr::str_extract(gpt_response, "3:.*\n\n4:") |> stringr::str_remove("3: ") |> stringr::str_remove("\n\n4:"),
    explanation = stringr::str_extract(gpt_response, "2:.*\n\n3:") |> stringr::str_remove("2: ") |> stringr::str_remove("\n\n3:"),
    certainty = stringr::str_extract(gpt_response, "4:.*$") |> stringr::str_remove("4: ") |> 
      # remove text that starts with '. The reasoning' and ends with '.'
      stringr::str_remove("\\. The reasoning.*\\.") |> 
      # similarly, remove text that starts with '. The assessment' and ends with '.'
      stringr::str_remove("\\. The assessment.*\\.")
  )

# Save the results:
# -----------------
readr::write_csv(results, "../data/task_statements_with_formatted_explanations.csv")

# Display the results:
# --------------------
DT::datatable(
  results |> dplyr::select(job_title, count, task_description, ai_exposure, explanation, certainty), 
  rownames = FALSE, 
  extensions = 'Buttons',
  options = list(
    pageLength = 15, scrollX = TRUE, autoWidth = TRUE,
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print') 
    )
)
```

## Summary of the Results

### Statistical Summaries
To summarize the results, we will first provide some summary statistics that capture the distribution of the exposure scores across the job titles. Then, we will provide a brief description of the exposure scores and the certainty of the estimates. 

```{r summary_results}
# Summary Statistics:
# --------------------
summary_results = results |> 
  dplyr::group_by(ai_exposure) |> 
  dplyr::summarize(
    num_job_titles = dplyr::n(),
    num_jobs = sum(count),
    mode_certainty = VIM::maxCat(certainty)
  )

# Display the summary statistics:
# -------------------------------
DT::datatable(
  summary_results, 
  rownames = FALSE, 
  extensions = 'Buttons',
  options = list(
    pageLength = 15, scrollX = TRUE, autoWidth = TRUE,
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print') 
    )
)
```

### Tree Map of the Exposure Scores

To visualize the distribution of the exposure scores across the job titles, we will use a tree map. The tree map will provide a visual representation of the exposure scores and the number of job titles in each category. 

```{r tree_map, fig.retina=2, fig.width=12, fig.height = 10, dpi = 900, out.width = "100%"}
treemap::treemap(
  results,
  index = c("ai_exposure", "job_title"),
  vSize = "count",
  vColor = "ai_exposure",
  palette = c("#B61E2E", "#828282", "#FFFFFF"),
  fontsize.labels=c(24,12),
  fontface.labels=c(2,2),
  border.col=c("black","#2F4F4F"), 
  border.lwds=c(7,2),
  align.labels=list(
    c("left", "top"), 
    c("center", "center")
  ),  
  overlap.labels=1, 
  inflate.labels=F,  
  title="AI Exposure Scores by Job Title: Area Proportional to the Number of Jobs",
  position.legend = 'none',
  force.print.labels = FALSE,
  draw=TRUE
)
```

